Name: Yuzi Li
Student Number: 1005131947
Journal Entry: Assignment 3 Journal
Data was downloaded from GEO with id GSE161243. Two gene expression experiments were conducted in the study, one on identifying the effect of CPT1A gene knockdown in LNCaP-C4-2 prostate cancer cells, the other identifying the effect of CPT1A gene overexpression in prostate cancer cells. The replicates of the same control/test conditions are assigned the same cell_type in the samples data frame: “KD_C”, “KD_S”, “OE_C”, “OE_S”, with “KD” standing for knockdown and “OE” standing for overexpression. “_C” are control samples, and “_S” are experimental samples. These 4 different cell types were indicated for TMM normalization.
Data was filtered and then normalized using TMM normalization, and sample variance was visualized using an MDS plot. The sample groups data appear to be well separated, and there appears to be low technical variation.
Differential gene expression analysis was conducted using BH correction. I used the benjamini-hochberg method for p-value correction because it is not overly stringent and is the standard method.
From thresholded over-representation analysis by G:Profiler, I see that in the CPT1A knockdown experiment, many up-regulated genes in CPT1A knockdown cells are involved in different mitosis-related events as well as cell cycle checkpoint signaling. Specifically, I noticed that some of these genes are involved in double-strand breaks repair and break-induced replication. Down-regulated genes in CPT1A knockdown cells are more related to amino acid biosynthesis and positive regulations on translation and transcription in response to stress.
In the CPT1A overexpression cells, many up-regulated genes are associated with regulation of ubiquitin-mediated protein catabolism and energy metabolism (glycolysis, gluconeogenesis, etc.). Processes like ribosome production and translation are down-regulated in CPT1A overexpression cells.
First, I installed some related packages:
#install required R and bioconductor packages
tryCatch(expr = { library("RCurl")},
error = function(e) { install.packages("RCurl")},
finally = library("RCurl"))
#use library
tryCatch(expr = { library("ggplot2")},
error = function(e) { install.packages("ggplot2")},
finally = library("ggplot2"))
#use easy cyRest library to communicate with cytoscape.
tryCatch(expr = { library("RCy3")},
error = function(e) { BiocManager::install("RCy3")},
finally = library("RCy3"))
Then, I prepared the non-thresholded gene lists for both CPT1A knockdown and overexpression experiments using the following code:
qlf_hits_kd[,"rank"] <- -log(qlf_hits_kd$PValue, base=10) * sign(qlf_hits_kd$logFC)
qlf_hits_oe[,"rank"] <- -log(qlf_hits_oe$PValue, base=10) * sign(qlf_hits_oe$logFC)
write.table(x=data.frame(genename=row.names(qlf_hits_kd),F_stat=qlf_hits_kd$rank),
file='cpt1a_kd.rnk',sep = '\t',
row.names = FALSE,col.names = FALSE,quote = FALSE)
write.table(x=data.frame(genename=row.names(qlf_hits_oe),F_stat=qlf_hits_oe$rank),
file='cpt1a_oe.rnk',sep = '\t',
row.names = FALSE,col.names = FALSE,quote = FALSE)
I used GSEA for non-thresholded gene set enrichment analysis because we learned to use this in a previous homework assignment, and GSEA is very user-friendly and lets users specify custom genesets as well as other detailed settings.
Settings used:
Ran the GSEA analysis in R using the following code:
gsea_jar <- '/home/rstudio/GSEA_4.2.3/gsea-cli.sh'
java_version <- '11'
working_dir <- getwd()
analysis_name_oe <- 'cpt1a_oe'
analysis_name_kd <- 'cpt1a_kd'
rnk_file_oe <- "cpt1a_oe.rnk"
rnk_file_kd <- "cpt1a_kd.rnk"
dest_gmt_file <- 'Human_GOBP_AllPathways_no_GO_iea_March_01_2021_symbol.gmt'
command <- paste("", gsea_jar, "GSEAPreRanked -gmx", dest_gmt_file, "-rnk",
file.path(working_dir, rnk_file_oe), "-collapse false -nperm 1000 -scoring_scheme weighted -rpt_label ",
analysis_name_oe," -plot_top_x 20 -rnd_seed 12345 -set_max 200 -set_min 15 -zip_report false -out" ,
working_dir, " > gsea_output.txt",sep=" ")
system(command)
command <- paste("", gsea_jar, "GSEAPreRanked -gmx", dest_gmt_file, "-rnk",
file.path(working_dir, rnk_file_kd), "-collapse false -nperm 1000 -scoring_scheme weighted -rpt_label ",
analysis_name_kd," -plot_top_x 20 -rnd_seed 12345 -set_max 200 -set_min 15 -zip_report false -out" ,
working_dir, " > gsea_output.txt",sep=" ")
system(command)
Top gene sets returned for CPT1A overexpression:
For up-regulated gene sets in CPT1A overexpression:
For down-regulated gene sets in CPT1A overexpression:
For up-regulated gene sets in CPT1A knockdown:
For down-regulated gene sets in CPT1A knockdown:
While most results from non-thresholded analysis agrees with the ones from thresholded analysis and are straightforward to compare, some top gene sets are not seen before in the thresholded results. This sometimes is due to the fact that different pathway datatbases were used. When the non-thresholded analysis returns top hits from MySigDB, which is not included in the thresholded analysis, it might be a bit harder to compare the results as the gene set descriptions can be quite different. However, upon further inspection, most top hits do agree with the thresholded results. A few surprising hits can also come up in the non-thresholded analysis, possibly because the strict threshold used in thresholded analysis is excluding too many genes than it should.
I will only do analysis on the CPT1A Overexpression experiement in the following sections as the article focused more on the overexpression experiment, not the knockdown.
How many nodes and how many edges in the resulting map?
There are 167 nodes and 750 edges in the CPT1A Overexpression enrichment map.
What thresholds were used to create this map?
Figure 1. Screenshot of network prior to manual layout. Red nodes are up-regulated.
I set the maximum number of annotations to 12 because I do not want to annotate the clusters that are too small. This way it is computationally less expensive and also gives a clearer focus to the map.
Figure 2. Parameters for autoannotations.
Figure 3. Clusters are calculated and annotated by AutoAnnotate. All nodes are red (up-regulated).
I created this figure by checking the publication-ready option in the Enrichment Map panel and then exporting the image. I also pasted the legend to the bottom right corner.
Figure 4. Publication-ready figure. All nodes are red (up-regulated).
I used the “Create summary network” function in autoannotate to create the theme network, in which only the collapsed clusters are included (all other nodes are ignored).
Figure 5. Collapsed theme network. All collapsed nodes (geneset clusters) are red (up-regulated). Very few of the clusters are interconnected.
Do the enrichment results support conclusions or mechanism discussed in the original paper? How do these results differ from the results you got from Assignment #2 thresholded methods
It indirectly supports the conclusions of the original paper. The original paper concludes that CPT1A overexpression increases mitochondrial ROS, and here we can see some downstream pathways affected, such as unfolded protein response and proteasomal catabolism, both of which also seen in the results from assignment 2 thresholded methods.
Many other enriched gene sets came up here are also seen in the thresholded analysis, such as glycolytic process and gluconeogenesis (the 2 major clusters of enriched pathways). Some unexpected clusters of gene sets that are not seen in the thresholded analysis also came up, such as neural tube closure and maturation virion assembly.
Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your result?
Palmitoylation of some viral proteins are essential for virion assembly, as discovered by multiple research groups (Veit 2012, Sobocińska et al. 2018). Because CPT1A is essential for beta oxidation, fatty acid biosynthesis is affected by overexpression of CPT1A, this could potentially cause the cells to produce more palmitoyl-CoA, which helps increase palmitoylation of some viral proteins.
As mentioned in the article, CPT1A has been identified as a potential therapeutic target for a growing list of cancers. Therefore I chose the approved drugs gene sets to investigate which pathways up-regulated by CPT1A overexpression can be targeted by drugs.
Figure 6. Gene sets selection and parameters for post-analysis. A one-sided Mann-Whitney test was used with 0.01 cutoff in the approved drugs database from Bader lab.
Figure 7. Post-analysis results added to the annotated clusters of genesets up-regulated by CPT1A overexpression. Most drugs target the glycolytic process cluster. Some target the gluconeogenesis cluster, while other clusters are targeted by very few approved drugs.
Zhu Y, Davis S, Stephens R, Meltzer PS, Chen Y (2008). “GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus.” Bioinformatics (Oxford, England), 24(23), 2798–2800.
Chen Y, Lun AAT, Smyth GK (2016). “From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.” F1000Research, 5, 1438.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W (2005). “BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis.” Bioinformatics, 21, 3439–3440.
Isserlin R (2022). “BCB420 lectures”. University of Toronto.
Gu Z, Eils R, Schlesner M (2016). “Complex heatmaps reveal patterns and correlations in multidimensional genomic data.” Bioinformatics.
Gu Z, Gu L, Eils R, Schlesner M, Brors B (2014). “circlize implements and enhances circular visualization in R.” Bioinformatics, 30, 2811-2812.
Stankovic-Valentin, N., & Melchior, F. (2018). Control of Sumo and ubiquitin by ROS: Signaling and disease implications. Molecular Aspects of Medicine, 63, 3–17. https://doi.org/10.1016/j.mam.2018.07.002
Miguel, V., Tituaña, J., Herrero, J. I., Herrero, L., Serra, D., Cuevas, P., Barbas, C., Puyol, D. R., Márquez-Expósito, L., Ruiz-Ortega, M., Castillo, C., Sheng, X., Susztak, K., Ruiz-Canela, M., Salas-Salvadó, J., González, M. A., Ortega, S., Ramos, R., & Lamas, S. (2020). Renal tubule CPT1A overexpression protects from kidney fibrosis by restoring mitochondrial homeostasis. https://doi.org/10.1101/2020.02.18.952440
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Merico, D., Isserlin, R., Stueker, O., Emili, A., & Bader, G. D. (2010). Enrichment map: A network-based method for gene-set enrichment visualization and interpretation. PloS One, 5(11), e13984.
Veit, M (2012). “Palmitoylation of virus proteins.” Biology of the cell, 104(9): 493-515.
Sobocińska J, Roszczenko-Jasińska P, Ciesielska A, Kwiatkowska K (2018). “Protein Palmitoylation and Its Role in Bacterial and Viral Infections.” Frontiers in Immunology, 8, 1664-3224.